The first step is to import all of the data.
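As a rough sketch of the import step, the helper below reads every CSV file in a directory and stacks the results into one table. The function name `read_trips` and the idea that the data lives as several CSV files in one folder are assumptions made for illustration; the original code and file names are not shown here.

```r
library(readr)
library(dplyr)

# Hypothetical helper: read every CSV in `data_dir` and stack them row-wise.
# Assumes all files share the same column names.
read_trips <- function(data_dir) {
  files <- list.files(data_dir, pattern = "\\.csv$", full.names = TRUE)
  if (length(files) == 0) {
    stop("No CSV files found in ", data_dir)
  }
  # col_types = cols() lets readr guess column types without printing them
  lapply(files, read_csv, col_types = cols()) %>%
    bind_rows()
}

# Usage (the path is a placeholder, not the author's actual location):
# trips <- read_trips("path/to/uber-data")
```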
Because the raw data is messy and unformatted, particularly the dates and times, we need to clean it up before we can process it correctly. We therefore use lubridate to convert our data into an understandable format.
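A minimal sketch of what this cleaning step might look like is given below. It assumes the timestamp column is called `Date_Time` and holds month/day/year strings such as `04/01/2014 00:11`; the helper name `add_date_parts` is made up for this example, and the original code may differ.

```r
library(dplyr)
library(lubridate)

# Hypothetical helper: derive date parts from a "MM/DD/YYYY HH:MM" string column.
add_date_parts <- function(trips) {
  trips %>%
    mutate(
      parsed    = mdy_hm(Date_Time),                  # parse month/day/year hour:minute
      Date      = factor(as_date(parsed)),            # calendar date, e.g. 2014-04-01
      Year      = factor(year(parsed)),
      Month     = factor(month(parsed, label = TRUE, abbr = FALSE),
                         ordered = FALSE),            # full month name, locale dependent
      Day       = factor(format(parsed, "%d")),       # zero-padded day of month
      DayOfWeek = wday(parsed, label = TRUE)          # ordered factor of weekday labels
    ) %>%
    select(-parsed)                                   # drop the temporary column
}
```

Keeping the parsed timestamp in a temporary column means the string is parsed once rather than once per derived field.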
## # A tibble: 4,534,327 x 9
##    Date_Time        Date       Year  Month Day   DayOfWeek   Lat   Lon Base
##    <chr>            <fct>      <fct> <fct> <fct> <ord>     <dbl> <dbl> <chr>
##  1 04/01/2014 00:11 2014-04-01 2014  April 01    Tue        40.8 -74.0 B02512
##  2 04/01/2014 00:17 2014-04-01 2014  April 01    Tue        40.7 -74.0 B02512
##  3 04/01/2014 00:21 2014-04-01 2014  April 01    Tue        40.7 -74.0 B02512
##  4 04/01/2014 00:28 2014-04-01 2014  April 01    Tue        40.8 -74.0 B02512
##  5 04/01/2014 00:33 2014-04-01 2014  April 01    Tue        40.8 -74.0 B02512
##  6 04/01/2014 00:33 2014-04-01 2014  April 01    Tue        40.7 -74.0 B02512
##  7 04/01/2014 00:39 2014-04-01 2014  April 01    Tue        40.7 -74.0 B02512
##  8 04/01/2014 00:45 2014-04-01 2014  April 01    Tue        40.8 -74.0 B02512
##  9 04/01/2014 00:55 2014-04-01 2014  April 01    Tue        40.8 -74.0 B02512
## 10 04/01/2014 01:01 2014-04-01 2014  April 01    Tue        40.8 -74.0 B02512
## # ... with 4,534,317 more rows
Now we want to see how the number of trips made by Uber varies throughout the week, so we build a summary table displaying this.
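One way to build such a table is sketched below. It assumes the data already has a `DayOfWeek` column like the one in the tibble above; the helper name `trips_by_weekday` and the output column `Trips` are illustrative choices, not taken from the original code.

```r
library(dplyr)

# Hypothetical helper: one row per day of the week with the number of trips.
trips_by_weekday <- function(trips) {
  trips %>%
    group_by(DayOfWeek) %>%
    summarise(Trips = n(), .groups = "drop")  # n() counts rows in each group
}
```

Setting `.groups = "drop"` returns an ungrouped table and avoids the regrouping message that `summarise()` prints in dplyr 1.0.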
Unfortunately, raw numbers aren't as easy to digest, so let's plot this on a graph.
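A bar chart is one plausible way to draw these counts. The sketch below assumes a table with `DayOfWeek` and `Trips` columns (both names are assumptions for this example) and uses ggplot2 with scales for axis labels; the original plot may have looked different.

```r
library(ggplot2)

# Hypothetical helper: bar chart of trips per day of the week.
# Expects columns `DayOfWeek` and `Trips`.
plot_trips_by_weekday <- function(weekday_counts) {
  ggplot(weekday_counts, aes(x = DayOfWeek, y = Trips)) +
    geom_col() +                                      # bar height = value of Trips
    scale_y_continuous(labels = scales::comma) +      # 1,000,000 instead of 1e+06
    labs(title = "Trips by day of the week",
         x = "Day of week", y = "Number of trips")
}
```

`geom_col()` is used rather than `geom_bar()` because the counts are already computed.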
Alright, but it's still kind of hard to understand when it's busiest for Uber, so let's try this out…
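One way to narrow down the busiest times is to count trips by day of the week together with a second grouping variable and draw the result as a heatmap. The sketch below assumes that second variable is the hour of day, which the text does not state; the helper names and the `Hour` and `Trips` columns are likewise illustrative.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Hypothetical helper: trips per (day of week, hour of day).
# Assumes `Date_Time` holds "MM/DD/YYYY HH:MM" strings and `DayOfWeek` exists.
trips_by_weekday_hour <- function(trips) {
  trips %>%
    mutate(Hour = hour(mdy_hm(Date_Time))) %>%   # hour of day, 0 to 23
    group_by(DayOfWeek, Hour) %>%
    summarise(Trips = n(), .groups = "drop")
}

# Hypothetical helper: heatmap with one tile per day/hour combination.
plot_weekday_hour_heatmap <- function(weekday_hour_counts) {
  ggplot(weekday_hour_counts, aes(x = Hour, y = DayOfWeek, fill = Trips)) +
    geom_tile(colour = "white") +
    labs(title = "Trips by hour and day of the week",
         x = "Hour of day", y = "Day of week")
}
```

Darker or lighter tiles then show at a glance which day and hour combinations see the most trips.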